DATA - 601

Analyse the Customer Behaviour On E-Commerce Platforms

Group #2:

Prashant Dhungana-30080130

Prateek Kaushik-30229287

Mariya Mathews-30192182

Arteen Rafiei-30043409

Nisha Pillai-30158934

Importing Important Libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Part 1: Data Import and Understanding

In [2]:
#Data import and understanding
customer_behaviour_df = pd.read_excel("E Commerce Dataset.xlsx",sheet_name='E Comm')
customer_behaviour_df.head()
Out[2]:
   CustomerID  Churn  Tenure  PreferredLoginDevice  CityTier  WarehouseToHome  PreferredPaymentMode  Gender  HourSpendOnApp  NumberOfDeviceRegistered    PreferedOrderCat  SatisfactionScore  MaritalStatus  NumberOfAddress  Complain  OrderAmountHikeFromlastYear  CouponUsed  OrderCount  DaySinceLastOrder  CashbackAmount
0       50001      1     4.0          Mobile Phone         3              6.0            Debit Card  Female             3.0                         3  Laptop & Accessory                  2         Single                9         1                         11.0         1.0         1.0                5.0          159.93
1       50002      1     NaN                 Phone         1              8.0                   UPI    Male             3.0                         4              Mobile                  3         Single                7         1                         15.0         0.0         1.0                0.0          120.90
2       50003      1     NaN                 Phone         1             30.0            Debit Card    Male             2.0                         4              Mobile                  3         Single                6         1                         14.0         0.0         1.0                3.0          120.28
3       50004      1     0.0                 Phone         3             15.0            Debit Card    Male             2.0                         4  Laptop & Accessory                  5         Single                8         0                         23.0         0.0         1.0                3.0          134.07
4       50005      1     0.0                 Phone         1             12.0                    CC    Male             NaN                         3              Mobile                  5         Single                3         0                         11.0         1.0         1.0                3.0          129.60

Part 2: Data Cleaning

In [6]:
#Dropping the columns that are not used in the analysis
customer_behaviour_df.drop(columns=["CustomerID", "CityTier", "WarehouseToHome", "NumberOfAddress", "OrderAmountHikeFromlastYear"], inplace=True)
customer_behaviour_df.head()
Out[6]:
   Churn  Tenure  PreferredLoginDevice  PreferredPaymentMode  Gender  HourSpendOnApp  NumberOfDeviceRegistered    PreferedOrderCat  SatisfactionScore  MaritalStatus  Complain  CouponUsed  OrderCount  DaySinceLastOrder  CashbackAmount
0      1     4.0          Mobile Phone            Debit Card  Female             3.0                         3  Laptop & Accessory                  2         Single         1         1.0         1.0                5.0          159.93
1      1     NaN                 Phone                   UPI    Male             3.0                         4              Mobile                  3         Single         1         0.0         1.0                0.0          120.90
2      1     NaN                 Phone            Debit Card    Male             2.0                         4              Mobile                  3         Single         1         0.0         1.0                3.0          120.28
3      1     0.0                 Phone            Debit Card    Male             2.0                         4  Laptop & Accessory                  5         Single         0         0.0         1.0                3.0          134.07
4      1     0.0                 Phone                    CC    Male             NaN                         3              Mobile                  5         Single         0         1.0         1.0                3.0          129.60
In [12]:
#Merging the duplicate label "Mobile" into "Mobile Phone" in the preferred order category
customer_behaviour_df["PreferedOrderCat"] = customer_behaviour_df["PreferedOrderCat"].replace("Mobile","Mobile Phone")
In [17]:
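#Plotting the distribution of each numerical column with missing values (Tenure, HourSpendOnApp, CouponUsed, OrderCount, DaySinceLastOrder) to decide which fill value to use in fillna
#Assumption: customer_behaviour_numerical was defined in a cell not shown here; it is taken to be the numeric column names
customer_behaviour_numerical = customer_behaviour_df.select_dtypes(include="number").columns
for col in customer_behaviour_numerical:
    if customer_behaviour_df[col].isnull().any():
        #dropping NaN values so the bins are computed from observed data only
        plt.hist(customer_behaviour_df[col].dropna(), bins='auto', density=True)
        plt.title(col)
        plt.show()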
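
The histograms are meant to guide the choice of fill value. A common rule of thumb, used here as an assumption rather than as the notebook's confirmed method, is to fill strongly skewed columns with the median and roughly symmetric ones with the mean. The sketch below illustrates that rule; `impute_by_skew`, the threshold of 1.0 and the `toy` frame are all hypothetical and only borrow column names from the dataset.

```python
import numpy as np
import pandas as pd

def impute_by_skew(df, columns, skew_threshold=1.0):
    """Fill NaNs with the median for skewed columns, otherwise the mean.

    Hypothetical helper: the threshold of 1.0 is a rule of thumb,
    not a value taken from the notebook.
    """
    filled = df.copy()  # leave the caller's frame untouched
    for col in columns:
        series = filled[col]
        if series.isnull().any():
            use_median = abs(series.skew()) > skew_threshold
            fill_value = series.median() if use_median else series.mean()
            filled[col] = series.fillna(fill_value)
    return filled

# Made-up values for illustration only (not rows from the real dataset).
toy = pd.DataFrame({
    "Tenure": [0.0, 1.0, 1.0, 2.0, 30.0, np.nan],          # right-skewed
    "HourSpendOnApp": [2.0, 3.0, 3.0, 4.0, np.nan, 3.0],   # symmetric
})
toy_filled = impute_by_skew(toy, ["Tenure", "HourSpendOnApp"])
```

On the real data the call would be something like `impute_by_skew(customer_behaviour_df, ["Tenure", "HourSpendOnApp", "CouponUsed", "OrderCount", "DaySinceLastOrder"])`, with the threshold adjusted after inspecting the histograms.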

Part 3: Data Analysis and Visualization

Customer Demographics and Preferences

● To what extent does the time spent on the app relate to how frequently customers make purchases (Order count)?
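
One way to quantify this relationship is a correlation coefficient. The sketch below is a minimal illustration, not the notebook's original cell: `usage_vs_orders` is a hypothetical helper and `toy` holds made-up values that only reuse the column names `HourSpendOnApp` and `OrderCount`. It reports Pearson's r (linear association) and Spearman's rho, computed here as the Pearson correlation of the ranks.

```python
import numpy as np
import pandas as pd

def usage_vs_orders(df, x="HourSpendOnApp", y="OrderCount"):
    """Return Pearson and Spearman correlations between two numeric columns."""
    pairs = df[[x, y]].dropna()          # keep rows where both values are known
    pearson = pairs[x].corr(pairs[y])    # linear association
    # Spearman's rho is the Pearson correlation of the ranks
    spearman = pairs[x].rank().corr(pairs[y].rank())
    return {"pearson": pearson, "spearman": spearman, "n": len(pairs)}

# Made-up values for illustration only.
toy = pd.DataFrame({
    "HourSpendOnApp": [1.0, 2.0, 3.0, 4.0, np.nan],
    "OrderCount": [1.0, 2.0, 4.0, 8.0, 3.0],
})
result = usage_vs_orders(toy)
```

On the real data this would be called as `usage_vs_orders(customer_behaviour_df)`. Rows with a missing value in either column are excluded, so the result depends on whether missing values were filled beforehand.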

In [20]:
 

● Do app usage and purchase frequency differ notably between male and female customers, and is the difference statistically significant?
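
A simple starting point for this question is to compare group means and the within-group correlation for each gender. The following is a sketch under assumptions: `compare_by_gender` is a hypothetical helper, and the `toy` frame is invented data that only reuses the dataset's column names.

```python
import pandas as pd

def compare_by_gender(df, x="HourSpendOnApp", y="OrderCount"):
    """Per-gender means of x and y, plus the x-y correlation within each gender."""
    means = df.groupby("Gender")[[x, y]].mean()
    corr_by_gender = pd.Series(
        {gender: group[x].corr(group[y]) for gender, group in df.groupby("Gender")}
    )
    return means, corr_by_gender

# Made-up values for illustration only.
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Female", "Male", "Male", "Male"],
    "HourSpendOnApp": [2.0, 3.0, 4.0, 1.0, 2.0, 3.0],
    "OrderCount": [1.0, 2.0, 3.0, 3.0, 2.0, 1.0],
})
means, corr_by_gender = compare_by_gender(toy)
```

Group means alone do not establish significance. A two-sample test such as `scipy.stats.ttest_ind(a, b, equal_var=False)` on the two gender groups would be one way to address that part of the question, assuming SciPy is available.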

In [21]:
 

● How does the choice of preferred login device relate to the amount of time spent on the app? Is there a correlation between login device preference and app usage?
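
Because login device is categorical, a per-device summary of app time is usually more informative than a single correlation coefficient. The sketch below assumes that the labels "Phone" and "Mobile Phone", both visible in the `head()` output, refer to the same device and merges them; that merge, the helper name `hours_by_device` and the `toy` values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

def hours_by_device(df):
    """Summarise HourSpendOnApp for each preferred login device."""
    # Assumption: "Phone" and "Mobile Phone" denote the same device
    device = df["PreferredLoginDevice"].replace("Phone", "Mobile Phone")
    return df.groupby(device)["HourSpendOnApp"].agg(["mean", "median", "count"])

# Made-up values for illustration only.
toy = pd.DataFrame({
    "PreferredLoginDevice": ["Mobile Phone", "Phone", "Computer", "Computer", "Phone"],
    "HourSpendOnApp": [3.0, 2.0, 4.0, 2.0, np.nan],
})
summary = hours_by_device(toy)
```

The `count` column shows how many non-missing observations support each mean, which helps when one device has far fewer users than another.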

In [22]:
 

Customer Behavior and Engagement

● Are there any patterns between preferred order category and gender?
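
For two categorical columns, a row-normalised contingency table makes patterns easy to read: each row shows how one gender's customers are distributed across order categories. This is a minimal sketch; `category_share_by` is a hypothetical helper and the `toy` rows are invented.

```python
import pandas as pd

def category_share_by(df, group_col, cat_col="PreferedOrderCat"):
    """Share of each order category within every level of group_col."""
    # normalize="index" makes each row sum to 1
    return pd.crosstab(df[group_col], df[cat_col], normalize="index")

# Made-up values for illustration only.
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male", "Male", "Male"],
    "PreferedOrderCat": ["Fashion", "Mobile Phone", "Mobile Phone",
                         "Mobile Phone", "Laptop & Accessory", "Mobile Phone"],
})
shares = category_share_by(toy, "Gender")
```

On the real data, `category_share_by(customer_behaviour_df, "Gender")` returns a table that can be passed to `sns.heatmap` or plotted as a stacked bar chart.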

In [23]:
 

● Does the preferred order category have any correlation with marital status?
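
Both variables here are categorical, so Pearson correlation does not apply directly. One commonly used alternative, offered as an assumption about how the question could be approached, is Cramér's V, which rescales the chi-square statistic of the contingency table to the range 0 (no association) to 1 (perfect association). The sketch computes the uncorrected version with NumPy; `cramers_v` and the toy frames are hypothetical.

```python
import numpy as np
import pandas as pd

def cramers_v(df, col_a, col_b):
    """Uncorrected Cramér's V between two categorical columns."""
    observed = pd.crosstab(df[col_a], df[col_b]).to_numpy(dtype=float)
    n = observed.sum()
    # Expected counts under independence: row total * column total / n
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    k = min(observed.shape) - 1
    if k == 0:                      # only one category on one side
        return float("nan")
    return float(np.sqrt(chi2 / (n * k)))

# Made-up values for illustration only.
perfect = pd.DataFrame({
    "MaritalStatus": ["Single", "Single", "Married", "Married"],
    "PreferedOrderCat": ["Mobile Phone", "Mobile Phone", "Grocery", "Grocery"],
})
independent = pd.DataFrame({
    "MaritalStatus": ["Single", "Single", "Married", "Married"],
    "PreferedOrderCat": ["Mobile Phone", "Grocery", "Mobile Phone", "Grocery"],
})
v_perfect = cramers_v(perfect, "MaritalStatus", "PreferedOrderCat")
v_independent = cramers_v(independent, "MaritalStatus", "PreferedOrderCat")
```

With small samples the uncorrected statistic tends to overstate association, so the value is best read alongside the underlying contingency table.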

In [24]:
 

● Does the number of orders change based on the coupons and cashback that the customer receives?
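
One way to look at this is to compare the average order count across coupon usage levels and across cashback quartiles. The sketch below is illustrative only: both helper names are hypothetical, the quartile split is an assumed choice of binning, and the `toy` values are constructed so that orders rise with both variables.

```python
import pandas as pd

def mean_orders_by_coupons(df):
    """Average OrderCount for each number of coupons used."""
    return df.groupby("CouponUsed")["OrderCount"].mean()

def mean_orders_by_cashback_quartile(df):
    """Average OrderCount within each CashbackAmount quartile (0 = lowest)."""
    quartile = pd.qcut(df["CashbackAmount"], q=4, labels=False, duplicates="drop")
    return df.groupby(quartile.rename("CashbackQuartile"))["OrderCount"].mean()

# Made-up values for illustration only.
toy = pd.DataFrame({
    "CouponUsed": [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 3.0, 3.0],
    "CashbackAmount": [100.0, 110.0, 120.0, 130.0, 140.0, 150.0, 160.0, 170.0],
    "OrderCount": [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0],
})
by_coupons = mean_orders_by_coupons(toy)
by_cashback = mean_orders_by_cashback_quartile(toy)
```

A rising pattern in either summary would suggest an association, though it would not by itself show that coupons or cashback cause additional orders.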

In [25]:
 
In [26]:
 

Customer Churn and Retention

● Satisfaction score and churn rate

In [27]:
 

● Complaints and churn rate
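
Since `Churn` is coded as 0 or 1, its mean within a group is that group's churn rate. The same grouping can therefore serve both the satisfaction score and the complaint views in this subsection. The sketch below is a hedged illustration: `churn_rate_by` is a hypothetical helper and the `toy` rows are invented.

```python
import pandas as pd

def churn_rate_by(df, col):
    """Churn rate and customer count for each value of col."""
    return df.groupby(col)["Churn"].agg(churn_rate="mean", customers="size")

# Made-up values for illustration only.
toy = pd.DataFrame({
    "SatisfactionScore": [5, 4, 1, 2, 5, 3],
    "Complain": [1, 1, 0, 0, 0, 1],
    "Churn": [1, 1, 0, 0, 1, 0],
})
by_complaint = churn_rate_by(toy, "Complain")
by_satisfaction = churn_rate_by(toy, "SatisfactionScore")
```

On the real data, `churn_rate_by(customer_behaviour_df, "Complain")` and `churn_rate_by(customer_behaviour_df, "SatisfactionScore")` give tables that plot naturally as bar charts.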

In [28]:
 

● What is the correlation of satisfaction score and complaints with the churn rate?
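
With a binary target such as `Churn`, the Pearson correlation against a numeric feature is the point-biserial correlation, and against another binary feature it is the phi coefficient. A minimal sketch using `DataFrame.corrwith` follows; `churn_correlations` is a hypothetical helper and the `toy` values are invented.

```python
import pandas as pd

def churn_correlations(df, features=("SatisfactionScore", "Complain")):
    """Pearson correlation of each feature with the 0/1 Churn column."""
    return df[list(features)].corrwith(df["Churn"])

# Made-up values for illustration only.
toy = pd.DataFrame({
    "SatisfactionScore": [5, 4, 2, 1, 3, 3],
    "Complain": [1, 1, 0, 0, 1, 0],
    "Churn": [1, 1, 0, 0, 1, 0],
})
correlations = churn_correlations(toy)
```

Because `SatisfactionScore` is an ordinal scale, a rank-based measure may be a reasonable alternative to Pearson's r here.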

In [29]:
 

Customer Satisfaction and Feedback

● Does tenure have an impact on satisfaction scores and the number of complaints raised?
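
Grouping tenure into bands and summarising each band is one way to approach this question. In the sketch below the band edges, the labels and the helper name `summarise_by_tenure_band` are assumptions chosen for illustration, and the `toy` rows are invented. Customers with a missing `Tenure` fall outside every band and are left out of the summary.

```python
import numpy as np
import pandas as pd

# Hypothetical band edges (left-closed intervals) and matching labels.
TENURE_EDGES = [0, 6, 12, 24, np.inf]
TENURE_LABELS = ["0-5", "6-11", "12-23", "24+"]

def summarise_by_tenure_band(df):
    """Mean satisfaction and complaint rate for each tenure band."""
    band = pd.cut(df["Tenure"], bins=TENURE_EDGES, labels=TENURE_LABELS, right=False)
    band = band.rename("TenureBand")
    return df.groupby(band, observed=True).agg(
        mean_satisfaction=("SatisfactionScore", "mean"),
        complaint_rate=("Complain", "mean"),   # Complain is 0/1, so mean = rate
        customers=("Complain", "size"),
    )

# Made-up values for illustration only.
toy = pd.DataFrame({
    "Tenure": [0.0, 4.0, 8.0, 13.0, 30.0, np.nan],
    "SatisfactionScore": [2, 4, 3, 5, 1, 3],
    "Complain": [1, 0, 0, 1, 0, 1],
})
tenure_summary = summarise_by_tenure_band(toy)
```

The `customers` column indicates how much data supports each band, which matters if long-tenure customers are rare.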

In [30]:
 
In [31]:
 
In [32]:
 
In [33]:
 

Conclusion